Exploring The Use Of Hybrid Similarity Measure For Author Name Disambiguation
Author
Abstract
Name disambiguation has become one of the hard-to-crack problems in virtual settings. With each passing day, more and more entities with identical features emerge online, making them difficult to tell apart. Digital libraries face a similar problem in differentiating the publications of authors with similar-looking names. This leads to incorrect attribution of publications, which makes the entire effort of indexing the publications of individual authors ineffective. This paper proposes a two-stage hybrid similarity computation mechanism that combines the best of both worlds. The proposed method uses a token-based similarity score in the first stage of comparison and, based on the results of that stage, uses a character-based similarity score in the second stage. Experimental results obtained on standard datasets indicate that the proposed technique improves considerably on existing methods.
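The two-stage idea described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's method: the abstract does not name the specific measures, so Jaccard overlap on name tokens stands in for the token-based stage and normalized Levenshtein distance for the character-based stage. The function names, the `accept` threshold, and the rule for when the second stage runs are all hypothetical.

```python
def tokenize(name):
    """Lowercase a name and split it into a set of tokens (punctuation dropped)."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return set(cleaned.split())


def jaccard(a, b):
    """Token-based score: |intersection| / |union| of the two token sets."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_similarity(a, b):
    """Character-based score: 1 - edit distance / length of the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def hybrid_similarity(a, b, accept=0.8):
    """Stage 1: token score. Stage 2 (assumed rule): if the token score is
    below the hypothetical `accept` threshold, fall back to the character score."""
    token_score = jaccard(a, b)
    if token_score >= accept:
        return token_score
    return char_similarity(a, b)


if __name__ == "__main__":
    for left, right in [("J. Smith", "Smith, J."), ("Jon Smith", "John Smith")]:
        print(left, "|", right, "->", hybrid_similarity(left, right))
```

In this sketch, reordered names such as "J. Smith" and "Smith, J." are settled by the token stage alone, while near-misses such as "Jon Smith" and "John Smith" share too few whole tokens and are passed to the character stage. A real system would tune the threshold on labeled data.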
Similar resources
Improving the Accuracy of Author Name Disambiguation Using Agglomerative Clustering
Today, digital libraries are important academic resources containing millions of citations and essential bibliographic information such as titles, authors' names, and location of publications. From the viewpoint of knowledge accumulation management, the ability to search for desired content quickly and accurately is of great importance. The complexity and similarity in these resources cause many challenges and...
Author Name Disambiguation Using a New Categorical Distribution Similarity
Author name ambiguity has been a long-standing problem which impairs the accuracy of publication retrieval and bibliometric methods. Most of the existing disambiguation methods are built on similarity measures, e.g., “Jaccard Coefficient”, between two sets of papers to be disambiguated, each set represented by a set of categorical features, e.g., coauthors and published venues. Such measures pe...
A Template Based Hybrid Model for Chinese Personal Name Disambiguation
This paper proposes a template based hybrid model for Chinese Personal Name Disambiguation (CPND). The template makes use of the features of personal role such as discriminating personal name (nickname, stage name), together with the specific context of most frequent words, personal name nearest words named entities, date and time that are effective for this disambiguation task, as well as surr...
Merging error analysis of name disambiguation based on author similarity
Falsely identifying different authors as one is called a merging error in the name disambiguation of coauthorship networks. Research on the measurement and distribution of merging errors helps in collecting high-quality coauthorship networks. Regarding measurement, we provide a Bayesian model to measure the errors through author similarity. We illustratively use the model and coauthor similar...
Scaling Author Name Disambiguation with CNF Blocking
An author name disambiguation (AND) algorithm identifies a unique author entity record from all similar or same publication records in scholarly or similar databases. Typically, a clustering method is used that requires calculation of similarities between each possible record pair. However, the total number of pairs grows quadratically with the size of the author database making such clustering...